Method And System For Optimal Video Transcoding Based On Utility Function Descriptors

ABSTRACT

Techniques for generating utility-based descriptors from compressed multimedia information are disclosed. A preferred method includes the steps of receiving at least a segment of compressed multimedia information, determining two or more portions of utility-based descriptor information based on one or more adaptation operations, each corresponding to a unique target rate, adapting the compressed multimedia segment by each of the portions of utility-based descriptor information to generate adapted multimedia segments, using a quality management method to generate a quality measurement for each adapted multimedia segment, and generating a utility-based descriptor based on the portions of utility-based descriptor information and the corresponding quality measurements.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based on U.S. provisional patent applications Ser. No. 60/376,129, filed Apr. 26, 2002, and No. 60/384,939, filed May 31, 2002, which are incorporated herein by reference for all purposes and from which priority is claimed.

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention relates to techniques for delivering multimedia content across a network, and more specifically, to techniques for transparently and adaptively transporting multimedia content across a wide range of networks.

2. Background Art

At the dawn of the 21st century, the Internet has achieved widespread use among businesses and consumers in the exchange of all forms of multimedia information. Graphic art, text, audio, video and other forms of information are continuously shared among users. In order to reduce bandwidth requirements to manageable levels, multimedia information is often stored and transported in the form of compressed bitstreams that are in a standard format. For example, in the case of audiovisual information, JPEG, Motion JPEG, MPEG-1, MPEG-2, MPEG-4, H.261 and H.263 are in widespread use. Unfortunately, while a multitude of differing types of standardized multimedia content have been developed and made available on the Internet, there presently exists no standard way to control the access, delivery, management and protection of such content. Recognizing this need, the Moving Picture Experts Group (“MPEG”) has recently commenced the MPEG-21 Multimedia Framework initiative in order to develop a solution. As further described in International Organisation for Standardisation (“ISO”) document ISO/IEC JTC1/SC29/WG11/N5231 (2002), one of the goals of MPEG-21 is to develop a technique for delivering different types of content in an integrated and harmonized way, so that the content delivery process is entirely transparent to a wide spectrum of multimedia users.

In order to accomplish such a technique, part 7 of MPEG-21 proposes the concept of what is called “Digital Item Adaptation.” That concept involves the adaptation of the resources and descriptions that constitute a digital item to achieve interoperable, transparent access to universal multimedia from any type of terminal and network. By implementing Digital Item Adaptation, users in a network would be unaware of the network- and terminal-specific issues that often affect the delivery of multimedia content, such as network congestion, quality limitations, and reliability of service. It is envisioned that a diverse community of users will therefore be able to share a multimedia experience, each at his or her individually acceptable level of quality.

Transcoding, which avoids the need to store content in different compressed formats for different network bandwidths and different terminals, is probably one of the most common methods of resource adaptation. In MPEG-7, so-called Transcoding Hints have been proposed in order to enable better transcoding by reducing computational complexity while preserving quality as much as possible.

Unfortunately, the proposed MPEG-7 Transcoding Hints do not provide information about feasible transcoding operators and their expected performance in meeting specific target rates. Nor do they provide a solution that may be useful to fulfill the multiple requirements necessary to ensure transparent, adaptive multimedia content delivery. Accordingly, there remains a need for a technique for delivering multiple types of multimedia content over a network to a wide spectrum of multimedia users having different acceptable levels of quality.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a technique for delivering multiple types of multimedia content over a network to a wide spectrum of multimedia users having different acceptable levels of quality.

Another object of the present invention is to provide multimedia content description techniques that are useful to fulfill several requirements.

In order to meet these and other objects of the present invention, which will become apparent with reference to further disclosure set forth below, the present invention provides techniques for generating utility-based descriptors from compressed multimedia information. A preferred method includes the steps of receiving at least a segment of compressed multimedia information, determining two or more portions of utility-based descriptor information based on one or more adaptation operations, each corresponding to a unique target rate, adapting the compressed multimedia segment by each of the portions of utility-based descriptor information to generate adapted multimedia segments, using a quality management method to generate a quality measurement for each adapted multimedia segment, and generating a utility-based descriptor based on the portions of utility-based descriptor information and the corresponding quality measurements.

In a preferred embodiment, the compressed multimedia information is MPEG-4 data, and from ten to twenty portions of utility-based descriptor information are utilized. The portions of utility-based descriptor information may be uniformly or non-uniformly sampled. Advantageously, the adaptation operations may include frame dropping, either by dropping the first B frames or by dropping all B frames, and may further include coefficient dropping.

In another embodiment, the present invention provides systems and methods for delivering compressed multimedia information to two or more users, each having a different target bit rate. In one arrangement, a method includes the steps of receiving at least a segment of compressed multimedia information and a corresponding utility-based descriptor, parsing the utility-based descriptor into portions, each corresponding to a unique target bit rate for each of the users, selecting the utility-based descriptor portion that corresponds to the unique target bit rate of each user, and adapting the compressed multimedia segment by the selected utility-based descriptor portion for each user. Target bit rate feedback information from the users, or from the network, may be utilized in the adaptation step.
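The per-user selection step can be sketched in Python as follows. This is a minimal illustration, not the claimed method: it assumes the parsed descriptor is a dictionary from sampled target bit rates (in kbps) to an operation label, and it assumes the selection rule "use the largest sampled rate that does not exceed the user's target". The names `select_portion` and `plan_for_users` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical parsed descriptor: sampled target bit rate (kbps) -> label of the
# adaptation operation described for that rate. Layout and labels are illustrative.
Descriptor = Dict[float, str]


def select_portion(descriptor: Descriptor, target_kbps: float) -> Optional[str]:
    """Pick the portion for the largest sampled rate that does not exceed the target."""
    feasible = [rate for rate in descriptor if rate <= target_kbps]
    if not feasible:
        return None  # no described operation fits within this user's bit rate
    return descriptor[max(feasible)]


def plan_for_users(descriptor: Descriptor,
                   user_targets: Dict[str, float]) -> Dict[str, Optional[str]]:
    """Select one descriptor portion per user (targets may come from feedback)."""
    return {user: select_portion(descriptor, rate)
            for user, rate in user_targets.items()}


if __name__ == "__main__":
    portions = {200.0: "no drop + coeff 10%",
                150.0: "one B per sub-GOP",
                100.0: "all B + coeff 20%"}
    print(plan_for_users(portions, {"laptop": 180.0, "phone": 120.0}))
```

A user whose target lies below every sampled rate gets `None`, which a real adaptation engine would need to handle, for example by falling back to the most aggressive described operation.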

The accompanying drawings, which are incorporated in and constitute part of this disclosure, illustrate preferred embodiments of the invention and serve to explain the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional diagram showing the relationships among adaptation space, utility space, and resource space;

FIG. 2 is a block diagram of an exemplary system in accordance with the present invention;

FIG. 3 is an illustrative diagram showing a two-dimensional adaptation space defined by a combination of frame dropping and coefficient dropping;

FIG. 4 is a graph showing an exemplary utility function in accordance with the present invention;

FIGS. 5(a)-(c) are graphs showing variations of the exemplary utility function shown in FIG. 4;

FIG. 6 is a schematic diagram of an exemplary utility-based description tool in accordance with the present invention; and

FIG. 7 is a schematic diagram of an exemplary utility-based descriptor in accordance with the present invention.

Throughout the Figs., the same reference numerals and characters, unless otherwise stated, are used to denote like features, elements, components or portions of the illustrated embodiments. Moreover, while the present invention will now be described in detail with reference to the figures, it is done so in connection with the illustrative embodiments.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring to FIG. 1, an exemplary embodiment of the present invention will be described. A utility-based framework provides a systematic approach to efficient video adaptation by modeling the relationships among three essential parameters: adaptation operations, resources, and utility. In general, adaptation operations can take the form of spatial domain adaptation, temporal domain adaptation, or object-based adaptation. Spatial domain adaptation may include spatial resolution reduction and quality or signal-to-noise adaptation, such as requantization or DCT coefficient dropping. Temporal domain adaptation may include frame dropping, and object-based adaptation may include video object prioritization and/or dropping. A particular operation defined by any of these adaptation methods is referred to herein as an adaptation operation.

Resources include available support from terminal devices and network capabilities, such as bandwidth, computation capability, power, and display size. Utility includes the quality of content resulting from a particular adaptation operation. Utility can be measured in an objective manner, such as by determining the peak signal-to-noise ratio (“PSNR”), or in a subjective one, e.g., by use of a subjective quality score. FIG. 1 illustrates the multidimensional space of adaptation, resources, and utility and the relations among them as applied to MPEG-4 compressed video.

The adaptation space 110 represents the conceptual space of all possible adaptation operations for one or more selected adaptation methods. Each dimension of the adaptation space represents one type of adaptation method and has a certain cardinal index representing the associated adaptation operations. For example, where frame dropping and coefficient dropping are both utilized, there are two dimensions in the adaptation space: frame dropping and coefficient dropping. The frame dropping dimension can be indexed by the number of frames dropped, e.g., no dropping, all B frames dropped in each sub-Group of Pictures (“GOP”) (a sub-GOP includes a set of successive frames beginning with an I or P frame and continuing to the next I or P frame), or all B and P frames dropped in each GOP. The coefficient dropping dimension can be indexed by the percentage of rate reduction to be achieved by coefficient dropping, e.g., no dropping, 10%, 20%, etc. In this way, a set of discrete points in the adaptation space can be defined, each point representing an adaptation operation specified by a particular combination of frame dropping and coefficient dropping.
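As a concrete illustration, the discrete points of such a two-dimensional adaptation space are simply the Cartesian product of the two index sets. The sketch below assumes the three frame-dropping indices listed above and, for brevity, coefficient-dropping percentages of 0% to 30% in 10% steps; the constant and function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Frame-dropping index values named in the text.
FRAME_OPS = ["no dropping",
             "all B frames dropped in each sub-GOP",
             "all B and P frames dropped in each GOP"]
# Coefficient-dropping index: percentage of rate reduction (example values only).
COEFF_OPS = [0, 10, 20, 30]

# A point of the 2-D adaptation space: (frame-dropping operation, coefficient %).
Point = Tuple[str, int]


def adaptation_space(frame_ops: List[str], coeff_ops: List[int]) -> List[Point]:
    """Enumerate every discrete (frame dropping, coefficient dropping) combination."""
    return list(product(frame_ops, coeff_ops))


if __name__ == "__main__":
    for point in adaptation_space(FRAME_OPS, COEFF_OPS):
        print(point)
```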

In some applications, the resource limitation may include several types of resources. For example, in order to provide video-streaming service to certain handheld devices, factors such as spatial resolution or computational capability should also be taken into account along with bandwidth. In general, all types of resources to be satisfied are represented by a multidimensional resource space. The utility space may likewise include attributes in multiple dimensions. In addition to PSNR, subjective measures such as the mean opinion score (“MOS”) and temporal smoothness may be included as other dimensions.

Referring again to FIG. 1, a video segment 101 is a unit undergoing adaptation, and each point in the adaptation space represents a particular adaptation operation. The adapted video segment has resulting values of resources and utilities, represented as corresponding points in the resource and utility spaces, respectively. The shaded cube in the resource space represents the resource constraints specified by applications. Note that there may exist multiple adaptation operations that satisfy the same resource requirement. The oval region in the adaptation space that is mapped into a single point in the resource space shows such a constant-resource set. Also, different adaptation operators may result in the same utility value. The rectangular region in the adaptation space represents such a constant-utility set.

Using the utility-based framework, video adaptation can be formulated as follows: given certain resource constraints, determine the optimal adaptation operation so that the utility of the adapted video is maximized. Since most adaptation problems likely to arise in the universal multimedia access (“UMA”) paradigm can be so formulated, such resource-constrained utility maximization may be considered a basic scenario of multimedia adaptation. While the disclosure herein is directed to optimizing a frame and coefficient dropping transcoding to satisfy available bandwidth, as an example of resource-constrained utility maximization, those skilled in the art should appreciate that the utility-based framework of the present invention can readily include constraints in the utility space and aim at overall resource minimization.
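Written as optimization problems (the notation here is introduced for illustration and is not the original's), let A be the set of discrete adaptation operations, R(a) the resource consumed by the video adapted by operation a, U(a) its utility (e.g., PSNR), R_avail the resource constraint, and U_min a utility constraint. The basic scenario and its resource-minimizing counterpart then read:

```latex
\begin{aligned}
&\text{utility maximization:}  && a^{*} = \arg\max_{a \in \mathcal{A}} U(a)
  \quad \text{subject to} \quad R(a) \le R_{\mathrm{avail}} \\
&\text{resource minimization:} && a^{*} = \arg\min_{a \in \mathcal{A}} R(a)
  \quad \text{subject to} \quad U(a) \ge U_{\min}
\end{aligned}
```

For a multidimensional resource space, the constraint R(a) ≤ R_avail is read component-wise.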

Referring next to FIG. 2, a system in accordance with the present invention will now be described. A server computer 210 is adapted to receive stored video 211 and/or live video 212. That video is preferably in a compressed format, such as MPEG-1, MPEG-2, or MPEG-4, although uncompressed digital video could be provided to the server, with compression occurring thereon. The server 210 includes software written in any available programming language for generating a utility function, in the form of utility-based descriptors, based on the received video. In accordance with the present invention and as described in further detail below, the descriptor is indicative of certain modifications to the compressed video, e.g., through the elimination of bidirectionally-predictive (“B”) frames or coefficients, that will result in predetermined levels of quality. The compressed-domain video and associated utility function are delivered over a transit network 220, such as the Internet or an intranet of sufficient bandwidth to transmit the compressed video. The transmitted information is received by a network computer 230, which in turn serves as the video adaptation engine of the system.

In particular, the network computer 230 includes software, again written in any available programming language, to adapt the incoming compressed video to the particular bandwidth requirements of several client devices 250, 251, 252, 253 that are served by associated access networks 240. In accordance with the present invention and as described in further detail below, the network computer 230 uses the utility-based descriptors generated by the server 210 to adapt the compressed video to such bandwidth requirements. Further, the network computer 230 may receive preference information 241 from the client users, and/or available bandwidth information 242 from the network, in order to optimize its adaptation operation.

The access network 240 may be the Internet, an intranet, or a proprietary network such as a wireless network that links mobile cellular user terminals 253 to the network computer 230. In the application of video streaming over bandwidth-limited networks, the bit rate of a video stream to be delivered is adapted to the time-varying bandwidth by an adaptation tool in real time.

In a preferred arrangement, a combination of frame dropping and coefficient dropping is used by the server computer 210 for the adaptation of nonscalable video to dynamic bandwidth. However, those skilled in the art should appreciate that other transcoding techniques could be utilized to adjust the bit rate of video streams for dynamic bandwidth adaptation, such as re-encoding, requantization of DCT coefficients, object-based transcoding, and image-size reduction. Fine-Granular-Scalability (“FGS”) and some of its variant forms, which have been adopted as new scalable coding tools in MPEG-4, also enable the dynamic adaptation of an FGS stream to time-varying bandwidth by selecting an appropriate number of bitplanes of the scalable stream.

Frame and coefficient dropping are simple methods of rate adaptation with low computational complexity, since they involve only the compressed-domain truncation of the portions of a bit sequence that correspond to the particular frames and the DCT coefficient symbols to be dropped. Further, for the application of video streaming over mobile wireless networks, they are more suitable for the low-delay, real-time operation that is strongly required in a transcoding proxy.

Moreover, the combination of frame and coefficient dropping enables the adaptation of the rate of a video stream by adjusting both spatial and temporal quality: frame dropping adjusts the frame rate by dropping some frames, and coefficient dropping adjusts spatial quality by dropping some of the DCT coefficients associated with higher-frequency components. Combining two or more transcoding methods increases the dynamic range of rate reduction.

Frame dropping will next be described. Frame dropping is a typical kind of temporal transcoding that adjusts the frame rate by dropping some frames from an incoming video stream. It is often used for rate adaptation to bandwidth variations in video streaming applications because of its efficiency and simplicity. One factor that should be considered is the selection of the frames to be dropped. For example, when an intra-coded frame (an “I” frame) or certain predictively coded frames (“P” frames) are dropped, frames that refer to the dropped frame need to be re-encoded.

Thus, it is preferred that only B frames and/or P frames on which no other frames depend for decoding are dropped, in units of a group of pictures (“GOP”), taking into account the sequence structure of the incoming video stream. Frame dropping provides only a coarse approximation to the target rate, since the smallest unit of data that can be removed is an entire frame. Therefore, the possible frame dropping operations are defined by specifying the frame type to be dropped rather than the reduction rate to be achieved by the dropping.

For a GOP in which anchor frames are spaced three pictures apart (M=3), a set of frame dropping operations depending on the assumed GOP structure may be defined as follows: no frame dropping; one B frame dropped in each sub-GOP; all B frames dropped; and all B and P frames dropped, resulting in an I-frame-only sequence. For a GOP in which successive anchor frames are adjacent (M=1), i.e., a GOP with no B frames, it is assumed that P frames are dropped from the end of each GOP: the last P frame dropped, the last two P frames dropped, and so on, through all P frames dropped in each GOP.
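The frame-dropping operations above can be sketched as functions over a GOP's frame types in display order. This is a simplified model under stated assumptions: frames are single characters ('I', 'P', 'B'), the GOP is assumed to be closed, and for the "one B frame per sub-GOP" operation the sketch assumes the first B frame after each anchor is dropped, since the text does not say which one. The function names are hypothetical.

```python
from typing import List

# Hypothetical operation labels for the M=3 case described in the text.
FRAME_OPS = ("none", "one_b_per_subgop", "all_b", "all_b_and_p")


def drop_frames_m3(gop: str, operation: str) -> str:
    """Frame types kept from a GOP in display order (e.g. "IBBPBBPBBPBB", M=3).

    For "one_b_per_subgop" the first B frame after each anchor is dropped;
    which B frame to drop is an assumption of this sketch.
    """
    if operation not in FRAME_OPS:
        raise ValueError(f"unknown operation: {operation}")
    kept: List[str] = []
    b_index = 0  # position of the current B frame within its sub-GOP
    for frame in gop:
        if frame in "IP":
            b_index = 0  # each anchor frame starts a new sub-GOP
            if not (operation == "all_b_and_p" and frame == "P"):
                kept.append(frame)
            continue
        b_index += 1
        if operation in ("all_b", "all_b_and_p"):
            continue
        if operation == "one_b_per_subgop" and b_index == 1:
            continue
        kept.append(frame)
    return "".join(kept)


def drop_trailing_p(gop: str, count: int) -> str:
    """M=1 GOP (e.g. "IPPPPPPP"): drop the last `count` P frames of the GOP."""
    if not gop.startswith("I") or set(gop[1:]) - {"P"}:
        raise ValueError("expected an I frame followed only by P frames")
    if not 0 <= count <= len(gop) - 1:
        raise ValueError("count out of range")
    return gop[:len(gop) - count]
```

Dropping P frames only from the end of an M=1 GOP keeps every remaining P frame's reference chain intact, which is why no re-encoding is needed.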

Although the selection of the frames to be dropped is limited, this approach may be sufficient on its own in terms of the amount of bit rate reduction and quality, or it may be combined with coefficient dropping (discussed below) in order to balance the temporal adaptation of frame dropping with the spatial adaptation of coefficient dropping. It should be noted that dropping frames may cause jerkiness, since the dropped frames are usually replaced by previous frames. In the first case, a GOP structure with anchor frames spaced more than one picture apart (M>1), the defined transcoding operations distribute the dropped frames evenly over time, which results in more comfortable temporal quality. On the other hand, in the case of a GOP with M=1, a special dynamic player that can adjust the presentation time of each decoded frame from the transcoded stream is needed to reduce the annoying effect caused by non-uniform frame dropping within a GOP.

Coefficient dropping will next be described. There are two fundamental approaches to spatial adaptation that operate in the frequency domain on DCT coefficients. The first is requantization, i.e., the modification of quantized coefficients by employing coarser quantization levels to reduce the bit rate. The second is coefficient dropping, in which higher-frequency coefficients, which are less important for image quality, are truncated. Coefficient dropping is preferred since it is more amenable to fast processing than requantization, which requires the implementation of recoding-type algorithms.

More specifically, assuming that a set of DCT coefficient run-length codes at the end of each block is eliminated, the number of DCT coefficient codes within each block that are kept after truncation is called the breakpoint. The breakpoint for each block may be determined using Lagrangian optimization, which minimizes the distortion caused by coefficient dropping while meeting the required target rate on a frame-by-frame basis. In the rate-distortion formulation of the optimization, a memoryless algorithm can be employed for simplicity; such an algorithm ignores the accumulated errors caused by motion compensation and treats each picture as if it were intra-coded. Ignoring the accumulated errors does not significantly affect quality and achieves essentially optimal performance (within 0.3 dB).

For a given video segment and target rate, we first assume uniform dropping, which gives the same rate reduction across different frames. Then, within each frame, we perform the optimal non-uniform dropping described above, which gives different rate reductions, with different breakpoints, among blocks while meeting the target rate of that frame.
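The frame-level Lagrangian breakpoint search described above can be sketched as follows. Assumptions: each block is given as a list of (bits, dequantized AC coefficient) pairs in run-length-code order, the DC term is not in the list and is never dropped, and distortion is measured as the energy of the dropped coefficients (for an orthonormal DCT this equals the pixel-domain squared error of the intra-only, drift-free model). The per-frame AC bit budget `target_bits` would come from the uniform cross-frame reduction. The multiplier is found by bisection; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One run-length code of a block: (bits it occupies, dequantized AC coefficient).
Code = Tuple[int, float]
Block = Sequence[Code]


def best_breakpoint(block: Block, lam: float) -> int:
    """Breakpoint k (codes kept) minimizing D(k) + lam * R(k) for one block.

    D(k) is the squared magnitude of the dropped coefficients; motion-compensated
    drift is ignored, i.e. the picture is treated as intra-coded.
    """
    dropped_sq = sum(coeff * coeff for _, coeff in block)
    best_k, best_cost = 0, dropped_sq  # k = 0: all AC codes dropped, no AC bits
    rate = 0
    for k, (bits, coeff) in enumerate(block, start=1):
        rate += bits
        dropped_sq -= coeff * coeff
        cost = dropped_sq + lam * rate
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


def kept_bits(blocks: List[Block], breakpoints: List[int]) -> int:
    """AC bits remaining in the frame for the given breakpoints."""
    return sum(sum(bits for bits, _ in blk[:k]) for blk, k in zip(blocks, breakpoints))


def allocate_breakpoints(blocks: List[Block], target_bits: int,
                         iterations: int = 60) -> List[int]:
    """Bisect the Lagrange multiplier so the frame's kept AC bits <= target_bits."""
    if target_bits < 0:
        raise ValueError("target_bits must be non-negative")

    def solve(lam: float) -> List[int]:
        return [best_breakpoint(blk, lam) for blk in blocks]

    lo, hi = 0.0, 1.0
    while kept_bits(blocks, solve(hi)) > target_bits:
        hi *= 2.0  # a larger multiplier penalizes rate more, so fewer codes are kept
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if kept_bits(blocks, solve(mid)) > target_bits:
            lo = mid
        else:
            hi = mid
    return solve(hi)  # hi always satisfies the budget
```

The bisection relies on the rate chosen for each block not increasing as the multiplier grows; since `hi` is only ever moved to a value that meets the budget, the returned breakpoints never exceed it.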

Unlike frame dropping, in which the reducible rates are limited to a few values because the smallest unit of data that can be removed is an entire frame, coefficient dropping can meet an available bandwidth quite accurately, within the upper bound of rate reduction, by adjusting the amount of dropped coefficients. Preferably, only AC DCT coefficients are dropped, in order to avoid the somewhat complicated syntax changes that result when all coefficients are dropped and to ensure a minimum necessary quality. The upper bound of rate reduction depends on the incoming video stream. Numerous coefficient-dropping operations may be defined by specifying the percentage of rate reduction to be achieved, rather than directly specifying the dropped coefficients themselves. For example, the operation coefficient dropping (10%) represents a 10% reduction of the bit rate of the incoming video stream by coefficient dropping.
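To see how a percentage-defined operation maps to a per-frame budget, and where its upper bound comes from, consider the sketch below. It assumes each frame is described by its total bits and the bits taken by its droppable AC codes (everything else, such as headers, motion vectors and DC terms, is treated as non-droppable), and it applies the same percentage to every frame, as in the uniform dropping described above. The function names are hypothetical.

```python
from typing import List, Tuple


def max_reduction_pct(frame_bits: int, ac_bits: int) -> float:
    """Upper bound of the rate reduction (%) when only AC codes may be dropped."""
    return 100.0 * ac_bits / frame_bits


def ac_budget(frame_bits: int, ac_bits: int, reduction_pct: float) -> int:
    """AC bits a frame may keep under 'coefficient dropping (reduction_pct%)'."""
    if not 0.0 <= reduction_pct <= max_reduction_pct(frame_bits, ac_bits):
        raise ValueError("reduction outside the range reachable by AC dropping")
    # int() truncates, so the budget never exceeds the exact allowance.
    return int(ac_bits - frame_bits * reduction_pct / 100.0)


def uniform_budgets(frames: List[Tuple[int, int]], reduction_pct: float) -> List[int]:
    """Apply the same percentage to every (frame_bits, ac_bits) pair of a segment."""
    return [ac_budget(total, ac, reduction_pct) for total, ac in frames]
```

Each resulting budget can then be handed to a per-frame breakpoint search; under uniform dropping, the segment's reachable reduction is limited by the frame with the smallest AC share.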

The combination of frame and coefficient dropping is next described. For higher bit rate reductions, frame dropping or coefficient dropping alone may not be sufficient to accommodate the available bandwidth. Moreover, only a few discrete points are achievable by frame dropping, while continuous rate adaptation is possible using coefficient dropping. Therefore, combining frame dropping and coefficient dropping extends the dynamic range of the reducible rate. The combination may also yield better perceptual quality than either technique alone, especially for large rate reductions, by optimizing the trade-off between spatial and temporal quality. For example, in order to reduce frame jerkiness at very low frame rates, temporal resolution can be traded against spatial quality while meeting the same rate reduction.

Referring next to FIG. 3, a two-dimensional adaptation space defined by the combination of frame dropping and coefficient dropping is shown. Each point represents a combined frame dropping/coefficient dropping transcoding operation. Note that the effect of the order of operations should be considered when combining coefficient and frame dropping. For example, there are two combinations with different orders of operation that achieve the same point 310: either 20% coefficient dropping followed by B frame dropping, or B frame dropping followed by 20% coefficient dropping. The results of both cases are the same if rate-based uniform coefficient dropping, in which the same rate reduction is applied across all frames, is employed. However, if different reduction ratios are assigned among frames to achieve globally optimal coefficient dropping based on rate allocation, different operation orders may yield different reduced rates and qualities. While the present disclosure is directed to the former, the present invention contemplates both scenarios.
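Why the order does not matter for rate-based uniform coefficient dropping can be seen in a rate-only model: both operations act on each frame independently, so they commute. The sketch below is a simplified illustration (frames reduced to a type and a bit count, hypothetical function names) and does not model quality.

```python
from typing import List, Tuple

Frame = Tuple[str, float]  # (frame type, bits): a rate-only model of a stream


def drop_b_frames(frames: List[Frame]) -> List[Frame]:
    """Frame dropping: remove every B frame."""
    return [f for f in frames if f[0] != "B"]


def uniform_coefficient_dropping(frames: List[Frame], pct: float) -> List[Frame]:
    """Rate-based uniform coefficient dropping: the same % cut in every frame."""
    return [(kind, bits * (1.0 - pct / 100.0)) for kind, bits in frames]


def total_bits(frames: List[Frame]) -> float:
    return sum(bits for _, bits in frames)


if __name__ == "__main__":
    gop = [("I", 100.0), ("B", 20.0), ("B", 20.0), ("P", 60.0)]
    a = uniform_coefficient_dropping(drop_b_frames(gop), 20.0)
    b = drop_b_frames(uniform_coefficient_dropping(gop, 20.0))
    print(a == b, total_bits(a))
```

Any allocation that makes a frame's reduction depend on which other frames are present, such as the globally optimized allocation mentioned above, breaks this independence, which is why the two orders can then differ.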

The generation of a utility function will next be described. In general, the relationships among the adaptation space, resource space, and utility space shown in FIG. 1 can be modeled by a utility function. A utility function may be defined as a media quality metric representing a user satisfaction index as a function of resources. In the context of the present invention, the adaptation space is a two-dimensional space specifying combinations of frame dropping and coefficient dropping, the resource space includes the available bandwidth, which varies with time, and the utility space includes the signal-to-noise measure of the transcoded video stream.

Referring next to FIG. 4, an exemplary utility function is shown, generated by a combined frame dropping/coefficient dropping transcoding method applied to previously stored MPEG-4 compressed video data, the “coastguard” sequence coded at 1.5 Mbps and adapted over a bandwidth range below 200 kbps. FIG. 4 is a graph plotting target rate in kbits/sec against PSNR, and illustrates four curves 420, 430, 440, 450 that represent the relationship between the target rates and PSNR qualities, each for a different frame dropping operation within the exemplary utility function.

In the example, four different frame-dropping operations and six coefficient-dropping operations are utilized. The frame dropping operations consist of no frames dropped, one B frame dropped in each sub-GOP, all B frames dropped, and all B and P frames dropped. The six coefficient dropping operations are set at 0%, 10%, 20%, 30%, 40%, and 50% reduction of the bit rate of the original test video stream. In this way, there are 23 combined operations, which employ different combinations of the defined frame dropping and coefficient dropping operations. Those 23 operations are shown as discrete points on curves 420, 430, 440, and 450, which illustrate the set of points for the various coefficient dropping operations when no frames are dropped 420, one B frame is dropped 430, all B frames are dropped 440, and all B and P frames are dropped 450, respectively.

FIG. 4 also illustrates a re-encoding curve 410, obtained by cascaded full decoding and re-encoding, which may therefore be considered a reference for performance comparison of the transcoding operations. It is important to note that for a given target bandwidth, there are multiple adaptation operations satisfying the same target rate. The optimal operation, i.e., the one with the highest video utility, is selected.
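Selecting the optimal operation for a given target bandwidth amounts to filtering the measured operating points by rate and taking the one with the highest utility. The sketch assumes each point records its frame-dropping operation, coefficient-dropping percentage, resulting rate and PSNR; the class and function names are hypothetical, and the numbers in the usage example are illustrative placeholders, not values read from FIG. 4.

```python
from typing import List, NamedTuple, Optional


class OperatingPoint(NamedTuple):
    frame_op: str      # frame-dropping operation
    coeff_pct: int     # coefficient-dropping percentage
    rate_kbps: float   # resulting bit rate
    psnr_db: float     # resulting utility


def optimal_operation(points: List[OperatingPoint],
                      target_kbps: float) -> Optional[OperatingPoint]:
    """Among operations whose rate fits the target, return the highest-PSNR one."""
    feasible = [p for p in points if p.rate_kbps <= target_kbps]
    return max(feasible, key=lambda p: p.psnr_db, default=None)


if __name__ == "__main__":
    # Illustrative placeholder values only, not measurements.
    pts = [OperatingPoint("no drop", 40, 150.0, 27.0),
           OperatingPoint("all B", 0, 145.0, 29.0),
           OperatingPoint("all B+P", 0, 60.0, 25.0)]
    print(optimal_operation(pts, 150.0))
```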

As shown in FIG. 4, the utility function depends on the type of video content, the chosen coding parameters of the incoming video stream, and the applied transcoding method. For video segments that share the same content type and transcoding method, generating a utility function requires repeated computation of PSNR quality and rate for the family of defined adaptation operations, by testing all possible operations.
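The exhaustive generation step can be sketched as a loop that applies every defined operation, records the resulting rate, and computes PSNR against a reference. The transcoder is abstracted as a hypothetical `adapt` callback returning the rate and the decoded pixels of the adapted segment (with dropped frames already replaced by repeated frames); the reference is assumed to be the decoded incoming stream, and pixels are flattened into a single list for simplicity. PSNR uses the standard definition 10·log10(peak²/MSE) with an 8-bit peak of 255.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple


def psnr(reference: Sequence[float], test: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


# Hypothetical transcoder hook: applies one operation and returns
# (resulting rate, decoded pixels of the adapted segment).
Adapt = Callable[[Sequence[float], Any], Tuple[float, Sequence[float]]]


def generate_utility_function(reference: Sequence[float], operations: Sequence[Any],
                              adapt: Adapt) -> List[Tuple[Any, float, float]]:
    """Exhaustively test every operation: one (operation, rate, PSNR) entry each."""
    table = []
    for op in operations:
        rate, decoded = adapt(reference, op)
        table.append((op, rate, psnr(reference, decoded)))
    return table
```

The cost of this loop grows with the number of operations times the cost of one transcode and decode, which is what makes it practical offline but not for live video.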

Utility function generation for live video will next be described. For previously recorded video, the utility function can be generated by off-line processing in a server where computational cost is not a concern, as in the case of FIG. 4. However, this option is generally not acceptable for live video, given the need for such extensive repetitive computation. Accordingly, a content-based utility prediction solution may be used to predict the utility function in the case of live video.

In general, video can be mapped to distinctive utility distribution classes, prepared in advance, based on computable content features such as motion activity and spatial activity extracted from compressed streams. Accordingly, for live video, a utility function corresponding to an expected incoming video stream is prepared in advance.

Predicting a utility function for live video is a two-step process involving an adaptive content classification loop and a real-time estimation path. First, in the adaptive content classification loop, a set of utility functions that can cover all types of content is generated and classified off-line. Later, when a live video stream is received, the real-time estimation path selects, in real time, the utility function associated with the content type of each video segment.
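The two paths can be illustrated with a nearest-centroid classifier over the two content features mentioned above. The original text does not specify the classification method; nearest-centroid, the feature normalization, and the data layout below are assumptions chosen for brevity, and all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, float]  # (motion activity, spatial activity), assumed normalized


@dataclass
class ContentClass:
    name: str
    centroid: Features
    # Hypothetical layout: frame-dropping op -> [(rate_kbps, psnr_db), ...]
    utility_function: Dict[str, List[Tuple[float, float]]]


def centroid(samples: Sequence[Features]) -> Features:
    """Offline step: mean feature vector of the training segments of one class."""
    n = len(samples)
    return (sum(s[0] for s in samples) / n, sum(s[1] for s in samples) / n)


def predict_utility(features: Features, classes: List[ContentClass]) -> ContentClass:
    """Real-time step: the class whose centroid is nearest to the segment's features."""
    return min(classes, key=lambda c: math.dist(features, c.centroid))
```

The real-time path costs only one distance computation per class and segment, which is what makes it suitable for live streams; `math.dist` requires Python 3.8 or later.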

The description of utility functions will next be discussed. In the utility-based framework, the utility function, which represents the distributions of the adaptation, resource, and utility spaces as well as the relationships among them, is delivered along with the associated video stream to an adaptation engine, e.g., one located on network computer 230. The main goal of the descriptor is to describe the distributions of the three spaces (adaptation, resource, and utility) and the relationships among them so as to support various types of usage scenarios efficiently. The descriptor should provide the adaptation engine with sufficient information about the possible adaptation operations that satisfy the constrained resources and about their associated utilities.

In order to describe a utility function such as that of FIG. 4, the range of bit rates is sampled into a finite set of points, and then all feasible combined frame dropping/coefficient dropping operations capable of achieving each sampled resource value, together with the associated PSNR values, are described using the sampled resource points as indexes. In general, a finite set of points over the multidimensional resource space is defined as the indexes in the description.

Linear or non-linear sampling of the resource space can be selected depending on the characteristics of the distributions in the adaptation space, taking into account the efficiency of the description as well as the number of sampling points. Interpolation between two consecutive resource points, and between the corresponding adaptation operations and utilities, may also be performed in a linear or non-linear manner. In the case of adaptation operations, however, it should be noted that interpolation is not feasible between different frame dropping operations, unlike between coefficient dropping operations.
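The sampling and interpolation rules can be sketched for the frame dropping/coefficient dropping case. The sketch assumes each frame-dropping operation is given as a curve of measured (rate, coefficient %, PSNR) points, interpolates linearly only along a single curve (that is, across coefficient dropping), and never across curves (frame dropping), consistent with the caveat above. Uniform linear sampling is shown; a non-linear sampling would simply pass a different list of rates. Function names and data layouts are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Sequence, Tuple

# One frame-dropping operation's curve: (rate_kbps, coeff_pct, psnr_db),
# sorted by increasing rate; coefficient dropping moves along the curve.
Curve = List[Tuple[float, float, float]]


def linear_samples(lo: float, hi: float, n: int) -> List[float]:
    """Uniform (linear) sampling of the resource range into n >= 2 points."""
    return [lo + i * (hi - lo) / (n - 1) for i in range(n)]


def interpolate_on_curve(curve: Curve, rate: float) -> Optional[Tuple[float, float]]:
    """Linearly interpolate (coeff_pct, psnr) at `rate` within one curve.

    Returns None outside the curve's rate range: there is no interpolation
    across different frame-dropping operations, and no extrapolation.
    """
    rates = [point[0] for point in curve]
    if not curve or rate < rates[0] or rate > rates[-1]:
        return None
    i = bisect_left(rates, rate)
    if rates[i] == rate:
        return curve[i][1], curve[i][2]
    (r0, c0, q0), (r1, c1, q1) = curve[i - 1], curve[i]
    w = (rate - r0) / (r1 - r0)
    return c0 + w * (c1 - c0), q0 + w * (q1 - q0)


def describe(curves: Dict[str, Curve],
             sample_rates: Sequence[float]) -> Dict[float, List[Tuple[str, float, float]]]:
    """Index the description by sampled rate: every feasible operation and its PSNR."""
    table: Dict[float, List[Tuple[str, float, float]]] = {}
    for rate in sample_rates:
        entries = []
        for frame_op, curve in curves.items():
            hit = interpolate_on_curve(curve, rate)
            if hit is not None:
                entries.append((frame_op, hit[0], hit[1]))
        table[rate] = entries
    return table
```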

By specifying a particular adaptation method, a constrained resource, and a utility according to the intended applications, the descriptor can support most resource-constrained scenarios.

Some adaptation operations may not be uniquely defined in terms of quality. For example, an operation of “10% reduction of bit rate by dropping DCT coefficients in each frame” (represented as coefficient dropping (10%)) does not specify the exact set of coefficients to be dropped. Different implementations may choose different sets and result in slightly different utility values. As a result, the utility value associated with a particular operation is not reliable.

On the other hand, some adaptation methods do not raise this ambiguity issue, owing to their unambiguous representation formats. For example, some scalable compression formats, such as JPEG-2000 and MPEG-4 FGS, provide unambiguously defined scalable layers. Subsets of the layers can be truncated consistently, with the same resulting quality, as long as the decoders are compliant with the standards.

Quality ranking may be employed in order to address this ambiguity issue. In some applications, the absolute utility values of each adapted media item are not important; instead, the relative ranking of such values among different adaptation operations satisfying the same resource may be critical. In such cases, ranking consistency is more likely to be achieved than absolute-value consistency. Accordingly, the descriptor describes the ranking instead of the utility value, to provide a notion of quality even when the quality values are not reliable due to the ambiguity. In addition, the descriptor may include a flag representing whether or not the ranking is consistent across different implementations. Assuming that there exists some consistency among practical implementations, the value of the flag may be obtained empirically.

Referring next to FIGS. 5(a)-(c), the variation in utility functions resulting from different implementations of coefficient dropping, used to obtain the value of the consistency flag, is shown. FIG. 5(a) is a reproduction of FIG. 4; FIG. 5(b) shows the same curves applied to the same data, except that macroblock optimization is selected; and FIG. 5(c) again shows the same curves applied to the same data, except that purely rate-based uniform coefficient dropping is used, without optimization across blocks.

As shown in FIGS. 5(a)-(c), utility values vary noticeably among the utility functions obtained with different implementations. Several operations with different qualities may achieve the same bit rate. Over the bit rate range, excluding the range covered by the shaded box in FIG. 5(c), the quality ranking of such equal-rate operations is consistent across implementations. Even within the shaded box, the rank of certain operations remains constant: the operation combining the dropping of all B frames with coefficient dropping has the worst utility regardless of implementation. Based on this observation, the descriptor describes the rank, together with an optional flag for each operator, to fully represent the consistency of the rank.
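One way the per-operator consistency flag might be obtained empirically is sketched below: rank the equal-rate operations at one resource point separately for each implementation (for instance, the three variants of FIGS. 5(a)-(c)), then flag an operator as consistent only if it receives the same rank in every implementation. The operator labels and rank values in the example are hypothetical, not measurements from the figures.

```python
from typing import Dict, List

def consistency_flags(ranks_by_impl: List[Dict[str, int]]) -> Dict[str, bool]:
    """ranks_by_impl holds one {operation: rank} dict per implementation,
    all for the same resource point. Returns {operation: rank is identical everywhere}."""
    if not ranks_by_impl:
        raise ValueError("need at least one implementation")
    # Only operations described by every implementation can be compared.
    common = set(ranks_by_impl[0])
    for ranks in ranks_by_impl[1:]:
        common &= set(ranks)
    return {op: len({ranks[op] for ranks in ranks_by_impl}) == 1 for op in common}
```

With three implementations in which `"CD"` and `"FD(B)"` swap ranks in one of them while `"FD(B)+CD"` is always ranked last, only `"FD(B)+CD"` is flagged as consistent. This mirrors the observation above that the combined operation is worst regardless of implementation.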

Referring next to FIG. 6, an exemplary utility-based descriptor is shown. The descriptor provides a set of adaptation descriptors 610, each of which describes a utility function associated with an adaptation method by including Resource, Utility, and UtilityFunction elements. The descriptor enables the selection of a defined adaptation method, such as combined frame and coefficient dropping, according to the intended scenario by specifying one of the enumerated methods in an attribute.

The Resource 620 and Utility 630 descriptors define, in terms of name and unit, the constrained resource and the utility, respectively, associated with the utility function 640 being described. In particular, multiple instantiations of the Resource field 620 are allowed in order to accommodate a multi-dimensional resource space. The UtilityFunction descriptor 640 represents a set of possible adaptation operators and associated utilities as a function of resource points.
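The structure of FIG. 6 can be mirrored in a simple in-memory data model, sketched below. The text does not prescribe a serialization, so the class and field names here are hypothetical; the point illustrated is that several Resource entries define a multi-dimensional resource space, and every sampled resource point in the utility function must supply one value per dimension.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

@dataclass(frozen=True)
class Resource:
    name: str  # e.g. "bandwidth"
    unit: str  # e.g. "kbps"

@dataclass(frozen=True)
class Utility:
    name: str  # e.g. "PSNR"
    unit: str  # e.g. "dB"

@dataclass
class AdaptationDescriptor:
    method: str                # one of the enumerated adaptation methods
    resources: List[Resource]  # one Resource per dimension of the resource space
    utility: Utility
    # UtilityFunction: sampled resource point -> {operation label: utility}
    utility_function: Dict[Tuple[float, ...], Dict[str, float]] = field(default_factory=dict)

    def add_resource_point(self, values: Sequence[float], operations: Dict[str, float]) -> None:
        # Each resource point needs exactly one value per Resource dimension.
        if len(values) != len(self.resources):
            raise ValueError(f"expected {len(self.resources)} resource values, got {len(values)}")
        self.utility_function[tuple(values)] = dict(operations)
```

A descriptor for combined frame and coefficient dropping constrained by both bandwidth and a (hypothetical) computation measure would declare two `Resource` entries and key each point by a pair such as `(256.0, 50.0)`.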

Referring next to FIG. 7, an exemplary UtilityFunction descriptor 640 is shown. The UtilityFunction descriptor 640 includes a set of ResourcePoints 710, each of which includes a set of AdaptationOperators 720 describing all possible adaptation operations that satisfy the sampled values of the constrained resources described by ResourceValues 730. A particular adaptation operation of the specified adaptation method is described by selecting the corresponding element. For example, FrameCoeffDropping 740 may be used to describe a particular operation of combined frame dropping/coefficient dropping transcoding by specifying the type and number of frames to be dropped and the percentage of bit rate to be reduced by truncating coefficients. As noted above, other operations may be used, such as WaveletReduction 750, which describes a specific wavelet reduction operation by specifying the number of levels and bitplanes to be truncated. The adaptation operator FGS 770 may be used to describe a specific operation on an MPEG-4 Fine Granularity Scalability ("FGS") stream by specifying the number of bitplanes of FGS frames and/or the number of bitplanes of FGST frames to be truncated from the enhancement layer.
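The alternative operator elements of FIG. 7 map naturally onto a tagged union, as in the following sketch. The class and field names are hypothetical stand-ins for FrameCoeffDropping 740, WaveletReduction 750 and FGS 770, and `describe` only demonstrates that a consumer dispatches on whichever element was selected at a resource point.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class FrameCoeffDropping:
    frame_type: str        # type of frame dropped, e.g. "B"
    frames_dropped: int    # number of frames of that type dropped
    coeff_drop_pct: float  # percentage of bit rate removed by truncating coefficients

@dataclass(frozen=True)
class WaveletReduction:
    levels: int     # number of wavelet levels truncated
    bitplanes: int  # number of bitplanes truncated

@dataclass(frozen=True)
class FGS:
    fgs_bitplanes: int   # bitplanes truncated from FGS frames of the enhancement layer
    fgst_bitplanes: int  # bitplanes truncated from FGST frames of the enhancement layer

AdaptationOperator = Union[FrameCoeffDropping, WaveletReduction, FGS]

@dataclass
class ResourcePoint:
    resource_values: List[float]         # ResourceValues: sampled constrained-resource values
    operators: List[AdaptationOperator]  # AdaptationOperators that satisfy those values

def describe(op: AdaptationOperator) -> str:
    """Return a human-readable summary of one adaptation operator."""
    if isinstance(op, FrameCoeffDropping):
        return (f"drop {op.frames_dropped} {op.frame_type}-frame(s), "
                f"cut {op.coeff_drop_pct}% of bit rate via coefficients")
    if isinstance(op, WaveletReduction):
        return f"truncate {op.levels} level(s) and {op.bitplanes} bitplane(s)"
    if isinstance(op, FGS):
        return f"truncate {op.fgs_bitplanes} FGS and {op.fgst_bitplanes} FGST bitplane(s)"
    raise TypeError(f"unknown adaptation operator: {op!r}")
```

For example, `ResourcePoint([128.0], [FrameCoeffDropping("B", 2, 10.0), FGS(2, 1)])` lists two alternative operations that both satisfy a 128 kbps constraint.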

In addition to the adaptation operation, the associated utility value is described by UtilityValue 760. Where the adaptation method is subject to ambiguity in specifying the adaptation operation, UtilityRankInformation 761 is instantiated instead of UtilityValue to describe the rank of the associated operation, with an optional consistency flag attribute indicating whether the rank is consistent.
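A sketch of how an adaptation engine might consume these entries is given below, assuming a single bit-rate resource. The selection policy (take the largest sampled rate within the available budget, then pick the highest UtilityValue, or the best rank when only UtilityRankInformation is present) is an illustrative assumption, not a procedure stated in the text, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class OperatorEntry:
    operator: str                          # label of the adaptation operation (hypothetical)
    utility_value: Optional[float] = None  # UtilityValue, when the operation is unambiguous
    rank: Optional[int] = None             # UtilityRankInformation rank, 1 = best
    consistent: Optional[bool] = None      # optional consistency flag for the rank

@dataclass
class ResourcePoint:
    rate: float                   # sampled bit rate (single resource dimension assumed)
    entries: List[OperatorEntry]

def select_operation(points: List[ResourcePoint], budget: float) -> Optional[str]:
    """Pick an operation at the largest sampled rate that fits within `budget`."""
    feasible = [p for p in points if p.rate <= budget]
    if not feasible:
        return None
    point = max(feasible, key=lambda p: p.rate)
    if point.entries and all(e.utility_value is not None for e in point.entries):
        # Unambiguous case: compare absolute utility values.
        return max(point.entries, key=lambda e: e.utility_value).operator
    # Ambiguous case: fall back to the described ranks.
    ranked = [e for e in point.entries if e.rank is not None]
    if not ranked:
        return None
    return min(ranked, key=lambda e: e.rank).operator
```

An engine could additionally prefer entries whose `consistent` flag is set when choosing among ranked operations; that refinement is omitted here to keep the sketch short.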

The foregoing merely illustrates the principles of the invention. Various modifications and alterations to the described embodiments will be apparent to those skilled in the art in view of the teachings herein. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the invention and are thus within the spirit and scope of the invention.

1-48. (canceled)
49. A method of delivering varying qualities of audiovisual content to one or more users, comprising: for each user, determining a level of quality constraint; receiving multimedia content to be transcoded; using a computing apparatus, for each user, in real time, transcoding the multimedia content to meet the level of quality constraint for the respective user; delivering the transcoded multimedia content to the respective user.
50. The method of claim 49, wherein transcoding includes spatial domain adaptation of the multimedia content.
51. The method of claim 49, wherein transcoding includes temporal adaptation of the multimedia content.
52. The method of claim 49, wherein transcoding includes object-based adaptation of the multimedia content.
53. The method of claim 49, wherein transcoding includes frame dropping.
54. The method of claim 49, wherein transcoding includes coefficient dropping.
55. The method of claim 49, wherein the level of constraint includes spatial resolution.
56. The method of claim 49, wherein the level of constraint includes computational capability.
57. The method of claim 49, wherein the level of constraint includes bandwidth limitations.
58. The method of claim 49, wherein transcoding the multimedia content comprises two or more transcoding operations.
59. A system for delivering varying qualities of audiovisual content to one or more users, comprising: an application receiving, for each user, a level of quality constraint; an application receiving multimedia content to be transcoded; an application transcoding the multimedia content to meet the level of quality constraint for each user in real time; a network application delivering the transcoded multimedia content to the respective user.
60. A method for adapting multimedia content to meet level of quality constraints in real-time, comprising: receiving one or more quality constraints from a user; receiving multimedia content to be transcoded; determining, using a computing apparatus, in real-time, one or more adaptation operations to be applied to the multimedia content to meet the one or more quality constraints; and applying the one or more adaptation operations to meet the one or more quality constraints.
61. The method of claim 60, wherein adaptation includes spatial domain adaptation.
62. The method of claim 60, wherein adaptation includes temporal adaptation.
63. The method of claim 60, wherein adaptation includes object-based adaptation.
64. The method of claim 60, wherein adaptation includes frame dropping.
65. The method of claim 60, wherein adaptation includes coefficient dropping.
66. The method of claim 60, wherein the level of constraint includes spatial resolution.
67. The method of claim 60, wherein the level of constraint includes computational capability.
68. The method of claim 60, wherein the level of constraint includes bandwidth limitations.
69. The method of claim 60, wherein applying the adaptation operation comprises applying two or more adaptation operations.
70. A system for adapting multimedia content to meet level of quality constraints in real-time, comprising: an application receiving one or more quality constraints from a user; an application receiving multimedia content to be transcoded; an application determining, in real-time, one or more adaptation operations to be applied to the multimedia content to meet the one or more quality constraints; and applying the one or more adaptation operations to meet the one or more quality constraints.